In this report, we will focus on studying the distances, Delta MES and MSR results of the splicing noise analysis of Ataxia RNAseq samples.
In this report, we focus on the results of the mis-splicing noise analysis of Ataxia RNAseq samples: 46 Cerebellum and 45 Frontal Cortex samples. Samples are pseudobulked by tissue, disease status and ataxia diagnosis, which splits the analysis into three levels and includes the splicing noise results comparing specific ataxia diagnoses (e.g. SCA1, FRDA…) with control samples.
The clusters now always compare the same number of case and control samples. The procedure is explained in the methodology.
Added results for Level 2 (AtaxiaSubtype) and references.
Changed the reference transcriptome from Gencode v43 to Gencode v39.
Updated methodology.
For every available BAM file to study, we apply the following steps:
Download and extraction of BAM files: the files are downloaded from s3://ataxia-bulk-rnaseq/nextflow_first_attemp/Star_2_pass_by_indv/output/STAR/align/BAM_files/ and subfolders.
Junction extraction: all junctions are extracted using regtools junctions extract after sorting and indexing the BAM files with samtools. One file in BED12 format is created for each BAM file.
Junction annotation: the junctions are read from the previously created files and merged into a single dataframe of read junctions. We also register the number of reads of each junction in every sample. The junctions located within the ENCODE blacklisted regions v2 are removed. The junction_annot() function from the package dasper is used to annotate the junctions to the Gencode v38 reference transcriptome. All junctions not classified as either novel_donor, novel_acceptor or annotated are ignored. We also remove all junctions smaller than 25bp (base pairs) and annotated introns that are ambiguously assigned to more than one gene.
Junction pairing: by looking for overlaps between the novel junctions and the annotated junctions in each sample, we measure the distance in bp between the novel and the reference splice site. Annotated introns that are never associated with a novel junction are considered never mis-spliced introns.
Filtering the distances: we remove the pairings in which a novel junction is associated with more than one reference intron across different samples. For more information about this process, please see the methods section of the Introverse paper [1].
Next, we need to decide on a clustering method to combine and compare different samples. More information in section about clustering.
Measuring the mis-splicing ratio: we add all novel junction read counts attached to an annotated intron across all samples in which the novel splicing event was observed, and divide by the total number of reads of the annotated intron and the novel junctions across the same set of samples. This gives a measurement of the mis-splicing ratio for a given annotated intron at both the donor splice site and the acceptor splice site. For more information about the mis-splicing ratio, please see section MSR.
Generation of the DB: two tables are created for each cluster: db_introns and db_novel. They contain the relevant information for the reference introns (including the never mis-spliced introns) and the novel junctions, respectively. This includes the MaxEntScan scores, the percentage of protein-coding transcripts and the classification into u2 and u12 introns.
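As a rough illustration of the mis-splicing ratio step, the following minimal Python sketch computes the ratio for one splice site of one annotated intron. The function name, the per-sample dictionaries and the read counts are hypothetical, and restricting the sums to the samples in which a novel junction was observed is one possible reading of the description above, not the pipeline's actual code.

```python
def mis_splicing_ratio(annotated_by_sample, novel_by_sample):
    """MSR at one splice site of one annotated intron.

    Both arguments map sample id -> read count. Only samples in which
    a novel junction was observed contribute to the sums (assumption).
    """
    observed = [s for s, reads in novel_by_sample.items() if reads > 0]
    novel = sum(novel_by_sample[s] for s in observed)
    annotated = sum(annotated_by_sample.get(s, 0) for s in observed)
    total = novel + annotated
    # With no novel reads the intron is never mis-spliced at this site: MSR = 0.
    return novel / total if total else 0.0


# Hypothetical read counts for a single annotated intron.
annotated = {"s1": 500, "s2": 440, "s3": 300}
novel_donor = {"s1": 40, "s2": 20, "s3": 0}
novel_acceptor = {"s1": 0, "s2": 0, "s3": 0}

msr_donor = mis_splicing_ratio(annotated, novel_donor)
msr_acceptor = mis_splicing_ratio(annotated, novel_acceptor)
print(f"MSR donor = {msr_donor:.3f}, MSR acceptor = {msr_acceptor:.3f}")
```

In this toy case the intron would be classified as mis-spliced at the donor site only, since no novel acceptor reads are attached to it.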
In our dataset, we have a total of 95 samples, corresponding to 48 different individuals. A total of 4 samples are removed because they belong to individuals diagnosed with CANVAS or AIFM1.
Three levels of analysis are considered in this report, each performed separately for every tissue.
Level 1 (Type): whether the sample is diagnosed with ataxia or not (disease status). For the Frontal Cortex tissue, samples with \(RIN<4\) are removed, while for the Cerebellum tissue only controls with \(RIN<=7\) are kept. This ensures that the RIN medians of cases and controls do not differ significantly.
Level 2 (AtaxiaSubtype): two different analyses are performed: known ataxia cases vs. controls and unknown ataxia cases vs. controls. In both scenarios, no restriction on RIN is required.
Level 3 (Diagnosis): a different analysis was performed for every ataxia diagnosis with at least three samples: FRDA, SCA1, SCA2 and SCA6. In all situations, control samples are selected to minimize a weighted Gower distance to the case samples (more information in following section).
Preliminary analyses showed a clear correlation between the median mis-splicing ratio and the number of samples pseudobulked. To avoid this effect in the comparisons between cases and controls, we decided to subsample the majority class until both classes have the same number of samples. The subsampling was performed so that the weighted Gower distance between the case samples and the control samples was minimized.
For datasets with both quantitative (e.g. RIN) and categorical (e.g. Brain bank) variables, also called mixed datasets, Gower’s distance is a common measurement of similarity between any two samples [2]. The similarity between samples can be defined as:
\[ S_{ij}=\frac{\sum_{k=1}^p s_{ijk}\delta_{ijk}w_k}{\sum_{k=1}^p \delta_{ijk}w_k} \] where \(p\) is the number of variables being compared, \(i\) and \(j\) refer to two different samples, \(s_{ijk}\) is the similarity between samples \(i\) and \(j\) for variable \(k\), \(\delta_{ijk}\) is 1 when the two samples can be compared on variable \(k\) (both values are available) and 0 otherwise, and \(w_k\) is the weight assigned to variable \(k\).
We decided to apply a weighting to the variables in order to increase the relevance of the main contributors to the number of junction reads across samples.
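The weighted similarity can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the sample records and the variable ranges are hypothetical, quantitative variables use the usual range-normalised Gower term, categorical variables use exact matching, and the weights are the Cerebellum values reported in this section.

```python
def gower_similarity(a, b, weights, ranges):
    """Weighted Gower similarity between two samples (dicts of covariates).

    `ranges` holds the observed range of each quantitative variable;
    variables absent from `ranges` are treated as categorical.
    """
    num = 0.0
    den = 0.0
    for var, w in weights.items():
        x, y = a.get(var), b.get(var)
        if x is None or y is None:
            continue  # delta_ijk = 0: the pair cannot be compared on this variable
        if var in ranges:
            s = 1.0 - abs(x - y) / ranges[var] if ranges[var] else 1.0
        else:
            s = 1.0 if x == y else 0.0
        num += w * s
        den += w
    return num / den if den else 0.0


# Cerebellum weights from this report; ranges and samples are hypothetical.
WEIGHTS = {"RIN": 0.989, "PMI": 0.001, "brain_bank": 0.003,
           "age_at_death": 0.001, "sex": 0.006}
RANGES = {"RIN": 5.0, "PMI": 50.0, "age_at_death": 40.0}

case = {"RIN": 6.5, "PMI": 24.0, "brain_bank": "A", "age_at_death": 70, "sex": "F"}
control = {"RIN": 7.0, "PMI": 24.0, "brain_bank": "A", "age_at_death": 70, "sex": "M"}

similarity = gower_similarity(case, control, WEIGHTS, RANGES)
distance = 1.0 - similarity  # Gower distance used for the matching
print(f"similarity = {similarity:.4f}, distance = {distance:.4f}")
```

With weights this skewed towards RIN, the distance between two samples is driven almost entirely by their RIN difference.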
The subsampling process was the following:
Variable weights: first, we divided the samples by tissue. Then, for each sample, we extract the number of reads associated to both annotated and novel junctions. Using a linear model to predict this number of reads, we measure the variance explained by each of the main covariates that will be employed in subsampling: RIN, PMI, Age at death, Brain bank and sex. The percentage of variance explained by each covariate will be considered as the weight in the next step.
Gower’s distance between samples: for each minority class sample, the most similar majority class sample was selected without repetition (i.e. once two samples were paired, neither of them could be selected again). Thus, both classes ended up with the same number of samples.
Wilcoxon test: between the two sets of samples, a Wilcoxon test is executed to test whether there are significant differences in the RIN medians. If significant differences are found, some extra restriction is applied to any of the sets. For example, for Cerebellum Level 1 study, control samples with RIN higher than 7 needed to be removed to ensure non-significant differences in the median.
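The pairing step of the subsampling process above can be sketched as a greedy nearest-neighbour matching without replacement. The function name and the sample values are hypothetical, and the toy distance uses only the RIN difference in place of the full weighted Gower distance; the order in which minority samples are processed is also an assumption, since greedy matching depends on it.

```python
def match_without_replacement(minority, majority, distance):
    """Pair each minority sample with its closest unused majority sample."""
    if len(majority) < len(minority):
        raise ValueError("majority class must have at least as many samples")
    available = list(majority)
    pairs = []
    for m in minority:
        best = min(available, key=lambda c: distance(m, c))
        pairs.append((m, best))
        available.remove(best)  # a matched sample cannot be selected again
    return pairs


# Hypothetical RIN values; a stand-in for the weighted Gower distance.
rin = {"case1": 6.1, "case2": 7.4, "ctrl1": 8.9, "ctrl2": 6.0, "ctrl3": 7.5}


def rin_distance(a, b):
    return abs(rin[a] - rin[b])


pairs = match_without_replacement(["case1", "case2"],
                                  ["ctrl1", "ctrl2", "ctrl3"],
                                  rin_distance)
selected_controls = [control for _, control in pairs]
print(pairs)
```

The selected controls would then be compared with the cases (for example on RIN) before being accepted as the control cluster.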
The obtained weights are the following:
| Covariate | Cerebellum weights | Frontal Cortex weights |
|---|---|---|
| RIN | 0.989 | 0.855 |
| PMI | 0.001 | 0.132 |
| Brain Bank | 0.003 | 0.013 |
| Age at death | 0.001 | 0.000 |
| Sex | 0.006 | 0.000 |
For each study, we will only consider the common annotated introns between the two classes. To do so, we generate the following dataframes:
Common annotated intron table: we looped through both db_introns tables and extracted only the information from the annotated introns common to both clusters. To identify common annotated introns, we used their locus (i.e. seqname:start-end:strand), since it is a unique identifier. The goal is to have the same number of annotated introns in cases and controls.
Common novel junction table: we looped through both db_novel tables and extracted only the information from the novel junctions associated to common annotated introns. Thus, we first needed to calculate the common annotated intron table.
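A minimal sketch of how the two common tables could be derived is shown below. The table names db_introns and db_novel come from the report, but the row layout, the `ref_locus` field and the example rows are hypothetical and only serve to illustrate the locus-based intersection.

```python
def locus(row):
    """Unique identifier of an intron: seqname:start-end:strand."""
    return f"{row['seqname']}:{row['start']}-{row['end']}:{row['strand']}"


def common_tables(db_introns, db_novel):
    """Both arguments map cluster name -> list of row dicts."""
    loci_per_cluster = [{locus(r) for r in rows} for rows in db_introns.values()]
    shared = set.intersection(*loci_per_cluster)
    common_introns = {c: [r for r in rows if locus(r) in shared]
                      for c, rows in db_introns.items()}
    # Novel junctions are kept only if their reference intron is shared.
    common_novel = {c: [r for r in rows if r["ref_locus"] in shared]
                    for c, rows in db_novel.items()}
    return common_introns, common_novel


# Hypothetical rows for two clusters.
db_introns = {
    "Case": [{"seqname": "chr1", "start": 100, "end": 200, "strand": "+"},
             {"seqname": "chr1", "start": 300, "end": 400, "strand": "+"}],
    "Control": [{"seqname": "chr1", "start": 100, "end": 200, "strand": "+"},
                {"seqname": "chr2", "start": 50, "end": 90, "strand": "-"}],
}
db_novel = {
    "Case": [{"novel_id": 1, "ref_locus": "chr1:100-200:+"},
             {"novel_id": 2, "ref_locus": "chr1:300-400:+"}],
    "Control": [{"novel_id": 3, "ref_locus": "chr2:50-90:-"}],
}

common_introns, common_novel = common_tables(db_introns, db_novel)
print(len(common_introns["Case"]), len(common_introns["Control"]))
```

Because the intersection is taken on loci, every cluster ends up with exactly the same set of annotated introns, while the number of novel junctions may still differ.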
The examples in this section correspond to the Level 1 study for Cerebellum.
The distances graph is generated by counting the number of unique novel junctions at any given distance of the reference splicing site. From the list of novel junctions associated to common reference introns, we group by novel_type and cluster, count the number of entries for each distance within 30 bp into both intron and exon sequence, and represent the data in a histogram. An example of the data can be seen in the following table:
| Novel jun. ID | Novel type | Cluster | Sequence | Distance [bp] |
|---|---|---|---|---|
| 1900408 | novel_acceptor | Control | intron | -26 |
| 262172 | novel_acceptor | Case | exon | 4 |
| 1562780 | novel_acceptor | Case | exon | 75 |
| 1744154 | novel_acceptor | Control | exon | 26 |
| 2006682 | novel_acceptor | Case | exon | 84 |
| 315218 | novel_donor | Control | exon | 3 |
| 1123582 | novel_acceptor | Control | intron | -22 |
| 149464 | novel_acceptor | Case | exon | 89 |
| 951814 | novel_donor | Case | exon | 10 |
| 446245 | novel_acceptor | Case | exon | 29 |
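The counting behind the distances graph can be sketched with the ten example rows of the table above. The variable names are hypothetical; the sketch simply counts junctions per novel type, cluster and distance within 30 bp, and also derives the per-distance difference between cases and controls.

```python
from collections import Counter

# (novel_type, cluster, distance in bp), copied from the example table.
rows = [
    ("novel_acceptor", "Control", -26), ("novel_acceptor", "Case", 4),
    ("novel_acceptor", "Case", 75), ("novel_acceptor", "Control", 26),
    ("novel_acceptor", "Case", 84), ("novel_donor", "Control", 3),
    ("novel_acceptor", "Control", -22), ("novel_acceptor", "Case", 89),
    ("novel_donor", "Case", 10), ("novel_acceptor", "Case", 29),
]

MAX_DISTANCE = 30  # histogram window, in bp, on each side of the splice site

# Number of unique novel junctions per (novel_type, cluster, distance).
counts = Counter(row for row in rows if abs(row[2]) <= MAX_DISTANCE)

# Case minus control counts per (novel_type, distance).
difference = Counter()
for (novel_type, cluster, distance), n in counts.items():
    sign = 1 if cluster == "Case" else -1
    difference[(novel_type, distance)] += sign * n

for key in sorted(counts):
    print(key, counts[key])
```

Negative distances fall in the intron sequence and positive ones in the exon sequence, matching the Sequence column of the table.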
For each study, different graphical representations will be presented. First, we present a histogram of the different distance values between -30 and 30 bp, with the case bars stacked on top of the control bars:
However, this representation does not allow for an easy comparison between cases and controls because of the bar stacking. To make the comparison easier, we split the graph into two more facets so that each cluster is represented on its own:
To focus on the difference in the number of unique novel junctions between cases and controls, the last graph represents the histogram resulting from subtracting the control counts from the case counts at each distance:
To generate the modulo 3 graphs, we first keep only the distances of at most 100 bp in either direction. Then, we group by novel_type and cluster, and calculate the distance in modulo 3 (i.e. the remainder of dividing the absolute distance by 3). Once we have all novel junctions split into the four different categories with their distances in modulo 3, we calculate the percentage that each modulo 3 value represents in each category. Thus, for each category:
\[ \text{mod.0 }\% = \frac{\# 0}{\#0 + \#1 + \#2}*100\%\qquad\quad \text{mod.1 }\% = \frac{\# 1}{\#0 + \#1 + \#2}*100\%\qquad\quad \text{mod.2 }\% = \frac{\# 2}{\#0 + \#1 + \#2}*100\% \]
Here, \(\#X\) represents the number of novel junctions at a modulo 3 distance equal to \(X\). For example, from the information in the following table, we can focus on the Case acceptor group, where we observe two 0s, one 1 and two 2s:
| Novel jun. ID | Novel type | Cluster | Sequence | Distance [bp] | Mod3 distance [bp] |
|---|---|---|---|---|---|
| Case acceptor | |||||
| 262172 | novel_acceptor | Case | exon | 4 | 1 |
| 1562780 | novel_acceptor | Case | exon | 75 | 0 |
| 2006682 | novel_acceptor | Case | exon | 84 | 0 |
| 149464 | novel_acceptor | Case | exon | 89 | 2 |
| 446245 | novel_acceptor | Case | exon | 29 | 2 |
| Case donor | |||||
| 951814 | novel_donor | Case | exon | 10 | 1 |
| Control acceptor | |||||
| 1900408 | novel_acceptor | Control | intron | -26 | 2 |
| 1744154 | novel_acceptor | Control | exon | 26 | 2 |
| 1123582 | novel_acceptor | Control | intron | -22 | 1 |
| Control donor | |||||
| 315218 | novel_donor | Control | exon | 3 | 0 |
Thus, the calculated percentages for the Case acceptor group are:
\[ \text{mod.0 }\% = 40\%\qquad\quad \text{mod.1 }\% = 20\%\qquad\quad \text{mod.2 }\% = 40\% \]
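The modulo 3 percentages can be recomputed from the example table with a short Python sketch. The variable names are hypothetical, and taking the absolute value of the distance before the modulo operation is an assumption inferred from the Mod3 column of the table (for instance, -26 is listed with a modulo 3 distance of 2).

```python
from collections import Counter, defaultdict

# (novel_type, cluster, distance in bp), copied from the example table.
rows = [
    ("novel_acceptor", "Case", 4), ("novel_acceptor", "Case", 75),
    ("novel_acceptor", "Case", 84), ("novel_acceptor", "Case", 89),
    ("novel_acceptor", "Case", 29), ("novel_donor", "Case", 10),
    ("novel_acceptor", "Control", -26), ("novel_acceptor", "Control", 26),
    ("novel_acceptor", "Control", -22), ("novel_donor", "Control", 3),
]

MAX_DISTANCE = 100  # bp, in either direction

# Count modulo 3 values within each (cluster, novel_type) category.
mod3_counts = defaultdict(Counter)
for novel_type, cluster, distance in rows:
    if abs(distance) <= MAX_DISTANCE:
        mod3_counts[(cluster, novel_type)][abs(distance) % 3] += 1

# Convert the counts of each category into percentages.
mod3_percent = {}
for category, counter in mod3_counts.items():
    total = sum(counter.values())
    mod3_percent[category] = {m: 100 * counter[m] / total for m in (0, 1, 2)}

for category in sorted(mod3_percent):
    print(category, mod3_percent[category])
```

Grouping additionally by the intron or exon sequence only requires adding that field to the category key.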
Additionally, we can further group the novel junctions by their intron/exon sequence, resulting in a total of up to 8 categories, where the same calculation of percentages can be applied:
| Novel jun. ID | Novel type | Cluster | Sequence | Distance [bp] | Mod3 distance [bp] |
|---|---|---|---|---|---|
| Case acceptor exon | |||||
| 262172 | novel_acceptor | Case | exon | 4 | 1 |
| 1562780 | novel_acceptor | Case | exon | 75 | 0 |
| 2006682 | novel_acceptor | Case | exon | 84 | 0 |
| 149464 | novel_acceptor | Case | exon | 89 | 2 |
| 446245 | novel_acceptor | Case | exon | 29 | 2 |
| Case donor exon | |||||
| 951814 | novel_donor | Case | exon | 10 | 1 |
| Control acceptor exon | |||||
| 1744154 | novel_acceptor | Control | exon | 26 | 2 |
| Control acceptor intron | |||||
| 1900408 | novel_acceptor | Control | intron | -26 | 2 |
| 1123582 | novel_acceptor | Control | intron | -22 | 1 |
| Control donor exon | |||||
| 315218 | novel_donor | Control | exon | 3 | 0 |
With the previous tables in mind, we represent them in three different ways. The first one just represents the number of unique novel junctions for each modulo 3 distance where \(abs(distance) <= 100\).
We also represent the percentage that each distance in modulo 3 represents when grouped by novel_type (i.e. acceptor/donor) and cluster (i.e. case/control).
Lastly, we represent the same data as before but taking into consideration whether it is located in the intronic or exonic sequence:
To represent the difference in MaxEntScan scores (MES), we subtract the MaxEntScan score of the novel junction from the MaxEntScan score of the annotated intron at each splice site. We obtain a table like the one shown below, and represent the density plot of the different values for delta_ss5score and delta_ss3score:
\[ \text{Delta MES}^{5'} = \text{MES}_{ref}^{5'}-\text{MES}_{novel}^{5'}\qquad\qquad\text{Delta MES}^{3'} = \text{MES}_{ref}^{3'}-\text{MES}_{novel}^{3'} \]
Example table with the Delta MES data:
| Novel jun. ID | Ref. jun. ID | Delta ss5score | Delta ss3score |
|---|---|---|---|
| 2035214 | 2035210 | 2.63 | 3.96 |
| 89328 | 89327 | 6.31 | -0.30 |
| 67502 | 67507 | 3.53 | -0.88 |
| 381357 | 381360 | -2.17 | 0.00 |
| 1547859 | 1547861 | 2.25 | 1.54 |
| 1151075 | 1151074 | -0.76 | 0.00 |
| 1877181 | 1877182 | 22.15 | 22.18 |
| 704572 | 704556 | 4.77 | 11.05 |
| 1533950 | 1533949 | 2.09 | 1.97 |
| 806079 | 806085 | 0.05 | 0.00 |
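As a small illustration of the Delta MES definition, the sketch below subtracts the novel junction scores from the reference intron scores at both splice sites. The function name, the `ss5score` and `ss3score` field names and the score values are hypothetical and are not taken from the database.

```python
def delta_mes(reference, novel):
    """Delta MES = MES of the reference intron minus MES of the novel junction."""
    return {
        "delta_ss5score": reference["ss5score"] - novel["ss5score"],
        "delta_ss3score": reference["ss3score"] - novel["ss3score"],
    }


# Hypothetical MaxEntScan scores for one novel junction and its reference intron.
reference = {"ss5score": 8.50, "ss3score": 9.00}
novel = {"ss5score": 5.87, "ss3score": 5.04}

deltas = delta_mes(reference, novel)
print({name: round(value, 2) for name, value in deltas.items()})
```

Under this sign convention, a positive value means that the annotated splice site scores higher than the novel one.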
As for the visual representations, the difference between the MaxEntScan scores for each splice site is shown in the X-axis. The Y-axis represents the kernel density estimate (smoothed version of a histogram). The fill color represents cases vs. controls.
Additionally, we can also represent the difference in the kernel density estimate between case and control:
The Mis-Splicing Ratio (MSR) is calculated in the data-analysis pipeline for each annotated intron found in the project samples. More information about the mis-splicing ratio can be found in the Ataxia RNAseq Pseudobulk - Median MSR Analysis report. To visualize this data, we represent the density plot for MSR at the different splice sites (i.e. donor or acceptor). Example table with the MSR data:
| Ref. jun. ID | Cluster | Ref. type | MSR Donor | MSR Acceptor |
|---|---|---|---|---|
| 1331421 | Control | never | 0.000 | 0.000 |
| 1972189 | Control | both | 0.059 | 0.004 |
| 414325 | Case | acceptor | 0.000 | 0.033 |
| 750187 | Control | never | 0.000 | 0.000 |
| 1996411 | Case | never | 0.000 | 0.000 |
| 573131 | Case | donor | 0.004 | 0.000 |
| 253464 | Case | never | 0.000 | 0.000 |
| 448971 | Control | never | 0.000 | 0.000 |
| 403929 | Case | never | 0.000 | 0.000 |
| 1503717 | Control | never | 0.000 | 0.000 |
We represent the distribution of the mis-splicing ratio (MSR) at both the donor site and the acceptor site. The X-axis represents the MSR, while the Y-axis represents the kernel density estimate.
Note that the distributions have a discontinuity at \(x=0.05\) to better represent all possible MSR values. In some cases, a discontinuity in the Y-axis will also be present if the difference between the two sample types exceeds a predefined threshold.
Under the Stats subsection, we will study different statistics of the annotated introns and novel junctions:
Unique annotated introns: number of annotated introns classified as never mis-spliced, mis-spliced at acceptor end, donor end or both. It is represented by cluster, so that we can study the variation in the number and percentage of never mis-spliced annotated introns.
Reads - annotated introns: total reads associated to annotated introns and the percentage that they represent against the total read depth.
Reads - novel junctions: total reads associated to novel junctions and the percentage that they represent against the total read depth. Results are separated by cluster.
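The first statistic can be sketched from the MSR example table shown earlier: each annotated intron is classified by the splice sites at which its MSR is above zero, and the classes are then counted and converted to percentages per cluster. The function and variable names are hypothetical; the classification rule is inferred from the Ref. type column of that table.

```python
from collections import Counter


def mis_splicing_class(msr_donor, msr_acceptor):
    """Classify an annotated intron from its MSR at each splice site."""
    if msr_donor > 0 and msr_acceptor > 0:
        return "both"
    if msr_donor > 0:
        return "donor"
    if msr_acceptor > 0:
        return "acceptor"
    return "never"


# (cluster, MSR donor, MSR acceptor), copied from the MSR example table.
rows = [
    ("Control", 0.000, 0.000), ("Control", 0.059, 0.004),
    ("Case", 0.000, 0.033), ("Control", 0.000, 0.000),
    ("Case", 0.000, 0.000), ("Case", 0.004, 0.000),
    ("Case", 0.000, 0.000), ("Control", 0.000, 0.000),
    ("Case", 0.000, 0.000), ("Control", 0.000, 0.000),
]

class_counts = Counter((cluster, mis_splicing_class(d, a)) for cluster, d, a in rows)
cluster_totals = Counter(cluster for cluster, _, _ in rows)

class_percent = {
    key: 100 * n / cluster_totals[key[0]] for key, n in class_counts.items()
}
for key in sorted(class_percent):
    print(key, class_counts[key], f"{class_percent[key]:.2f}%")
```

The read-based statistics follow the same pattern, dividing the reads assigned to annotated introns or novel junctions by the total read depth of the cluster.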
Distribution of sample RIN for Cerebellum level 1 study:
| Mis-spliced site | Cluster | # Annotated introns | # Ann. introns by splice site | Percent [%] |
|---|---|---|---|---|
| acceptor | Case | 198167 | 40435 | 20.40 |
| acceptor | Control | 198167 | 39710 | 20.04 |
| both | Case | 198167 | 33845 | 17.08 |
| both | Control | 198167 | 34144 | 17.23 |
| donor | Case | 198167 | 21951 | 11.08 |
| donor | Control | 198167 | 21934 | 11.07 |
| never | Case | 198167 | 101936 | 51.44 |
| never | Control | 198167 | 102379 | 51.66 |
| Cluster | # Annotated introns | # Reads in cluster | # Ann. intron reads | Percentage [%] |
|---|---|---|---|---|
| Case | 198167 | 4052024612 | 423466847 | 10.45 |
| Control | 198167 | 4520672127 | 521789300 | 11.54 |
| Cluster | Novel type | # Novel junctions | # Novel junc. in splice site | # Reads in cluster | # Novel junc. reads | Percentage [%] |
|---|---|---|---|---|---|---|
| Case | novel_acceptor | 234748 | 136989 | 4052024612 | 2355486 | 0.06 |
| Case | novel_donor | 234748 | 97759 | 4052024612 | 1700082 | 0.04 |
| Control | novel_acceptor | 238281 | 137983 | 4520672127 | 2803847 | 0.06 |
| Control | novel_donor | 238281 | 100298 | 4520672127 | 2003931 | 0.04 |
Distribution of sample RIN for Frontal Cortex level 1 study:
| Mis-spliced site | Cluster | # Annotated introns | # Ann. introns by splice site | Percent [%] |
|---|---|---|---|---|
| acceptor | Case | 212490 | 42989 | 20.23 |
| acceptor | Control | 212490 | 42227 | 19.87 |
| both | Case | 212490 | 38202 | 17.98 |
| both | Control | 212490 | 37676 | 17.73 |
| donor | Case | 212490 | 25739 | 12.11 |
| donor | Control | 212490 | 25323 | 11.92 |
| never | Case | 212490 | 105560 | 49.68 |
| never | Control | 212490 | 107264 | 50.48 |
| Cluster | # Annotated introns | # Reads in cluster | # Ann. intron reads | Percentage [%] |
|---|---|---|---|---|
| Case | 212490 | 4769291070 | 854565112 | 17.92 |
| Control | 212490 | 5300507311 | 862005120 | 16.26 |
| Cluster | Novel type | # Novel junctions | # Novel junc. in splice site | # Reads in cluster | # Novel junc. reads | Percentage [%] |
|---|---|---|---|---|---|---|
| Case | novel_acceptor | 270355 | 152012 | 4769291070 | 2948776 | 0.06 |
| Case | novel_donor | 270355 | 118343 | 4769291070 | 2176172 | 0.05 |
| Control | novel_acceptor | 265070 | 147966 | 5300507311 | 3030018 | 0.06 |
| Control | novel_donor | 265070 | 117104 | 5300507311 | 2263035 | 0.04 |
Distribution of sample RIN for Cerebellum level 2 study:
| Mis-spliced site | Cluster | # Annotated introns | # Ann. introns by splice site | Percent [%] |
|---|---|---|---|---|
| acceptor | Control | 195965 | 38514 | 19.65 |
| acceptor | KnownAtaxia | 195965 | 36451 | 18.60 |
| both | Control | 195965 | 31939 | 16.30 |
| both | KnownAtaxia | 195965 | 27608 | 14.09 |
| donor | Control | 195965 | 21431 | 10.94 |
| donor | KnownAtaxia | 195965 | 21444 | 10.94 |
| never | Control | 195965 | 104081 | 53.11 |
| never | KnownAtaxia | 195965 | 110462 | 56.37 |
| Cluster | # Annotated introns | # Reads in cluster | # Ann. intron reads | Percentage [%] |
|---|---|---|---|---|
| Case | 195965 | 3448881421 | 324372917 | 9.41 |
| Control | 195965 | 3977174051 | 432584614 | 10.88 |
| Cluster | Novel type | # Novel junctions | # Novel junc. in splice site | # Reads in cluster | # Novel junc. reads | Percentage [%] |
|---|---|---|---|---|---|---|
| Case | novel_acceptor | 189450 | 108616 | 3448881421 | 1741540 | 0.05 |
| Case | novel_donor | 189450 | 80834 | 3448881421 | 1260992 | 0.04 |
| Control | novel_acceptor | 221459 | 128194 | 3977174051 | 2379406 | 0.06 |
| Control | novel_donor | 221459 | 93265 | 3977174051 | 1688372 | 0.04 |
| Mis-spliced site | Cluster | # Annotated introns | # Ann. introns by splice site | Percent [%] |
|---|---|---|---|---|
| acceptor | Control | 182348 | 26187 | 14.36 |
| acceptor | UnknownAtaxia | 182348 | 31827 | 17.45 |
| both | Control | 182348 | 14442 | 7.92 |
| both | UnknownAtaxia | 182348 | 19905 | 10.92 |
| donor | Control | 182348 | 16653 | 9.13 |
| donor | UnknownAtaxia | 182348 | 17614 | 9.66 |
| never | Control | 182348 | 125066 | 68.59 |
| never | UnknownAtaxia | 182348 | 113002 | 61.97 |
| Cluster | # Annotated introns | # Reads in cluster | # Ann. intron reads | Percentage [%] |
|---|---|---|---|---|
| Case | 182348 | 1723566342 | 159025954 | 9.23 |
| Control | 182348 | 1693165552 | 136668669 | 8.07 |
| Cluster | Novel type | # Novel junctions | # Novel junc. in splice site | # Reads in cluster | # Novel junc. reads | Percentage [%] |
|---|---|---|---|---|---|---|
| Case | novel_acceptor | 137582 | 80978 | 1723566342 | 856784 | 0.05 |
| Case | novel_donor | 137582 | 56604 | 1723566342 | 612918 | 0.04 |
| Control | novel_acceptor | 102863 | 58382 | 1693165552 | 669761 | 0.04 |
| Control | novel_donor | 102863 | 44481 | 1693165552 | 458738 | 0.03 |
Distribution of sample RIN for Frontal Cortex level 2 study:
| Mis-spliced site | Cluster | # Annotated introns | # Ann. introns by splice site | Percent [%] |
|---|---|---|---|---|
| acceptor | Control | 209708 | 40473 | 19.30 |
| acceptor | KnownAtaxia | 209708 | 41017 | 19.56 |
| both | Control | 209708 | 34187 | 16.30 |
| both | KnownAtaxia | 209708 | 34357 | 16.38 |
| donor | Control | 209708 | 24851 | 11.85 |
| donor | KnownAtaxia | 209708 | 25050 | 11.95 |
| never | Control | 209708 | 110197 | 52.55 |
| never | KnownAtaxia | 209708 | 109284 | 52.11 |
| Cluster | # Annotated introns | # Reads in cluster | # Ann. intron reads | Percentage [%] |
|---|---|---|---|---|
| Case | 209708 | 3572908572 | 665252255 | 18.62 |
| Control | 209708 | 4100844167 | 702866511 | 17.14 |
| Cluster | Novel type | # Novel junctions | # Novel junc. in splice site | # Reads in cluster | # Novel junc. reads | Percentage [%] |
|---|---|---|---|---|---|---|
| Case | novel_acceptor | 241427 | 134997 | 3572908572 | 2283057 | 0.06 |
| Case | novel_donor | 241427 | 106430 | 3572908572 | 1709242 | 0.05 |
| Control | novel_acceptor | 239218 | 133056 | 4100844167 | 2464214 | 0.06 |
| Control | novel_donor | 239218 | 106162 | 4100844167 | 1849022 | 0.05 |
| Mis-spliced site | Cluster | # Annotated introns | # Ann. introns by splice site | Percent [%] |
|---|---|---|---|---|
| acceptor | Control | 195328 | 31309 | 16.03 |
| acceptor | UnknownAtaxia | 195328 | 30651 | 15.69 |
| both | Control | 195328 | 19848 | 10.16 |
| both | UnknownAtaxia | 195328 | 17533 | 8.98 |
| donor | Control | 195328 | 20335 | 10.41 |
| donor | UnknownAtaxia | 195328 | 19693 | 10.08 |
| never | Control | 195328 | 123836 | 63.40 |
| never | UnknownAtaxia | 195328 | 127451 | 65.25 |
| Cluster | # Annotated introns | # Reads in cluster | # Ann. intron reads | Percentage [%] |
|---|---|---|---|---|
| Case | 195328 | 1659144178 | 253102013 | 15.25 |
| Control | 195328 | 1765693058 | 293450631 | 16.62 |
| Cluster | Novel type | # Novel junctions | # Novel junc. in splice site | # Reads in cluster | # Novel junc. reads | Percentage [%] |
|---|---|---|---|---|---|---|
| Case | novel_acceptor | 126819 | 71270 | 1659144178 | 833338 | 0.05 |
| Case | novel_donor | 126819 | 55549 | 1659144178 | 618577 | 0.04 |
| Control | novel_acceptor | 140907 | 78017 | 1765693058 | 972618 | 0.06 |
| Control | novel_donor | 140907 | 62890 | 1765693058 | 760132 | 0.04 |
Distribution of sample RIN for Cerebellum level 3 study:
| Mis-spliced site | Cluster | # Annotated introns | # Ann. introns by splice site | Percent [%] |
|---|---|---|---|---|
| acceptor | Control | 176181 | 23705 | 13.45 |
| acceptor | FRDA | 176181 | 24542 | 13.93 |
| both | Control | 176181 | 12569 | 7.13 |
| both | FRDA | 176181 | 13221 | 7.50 |
| donor | Control | 176181 | 15157 | 8.60 |
| donor | FRDA | 176181 | 15665 | 8.89 |
| never | Control | 176181 | 124750 | 70.81 |
| never | FRDA | 176181 | 122753 | 69.67 |
| Cluster | # Annotated introns | # Reads in cluster | # Ann. intron reads | Percentage [%] |
|---|---|---|---|---|
| Case | 176181 | 896665067 | 89994863 | 10.04 |
| Control | 176181 | 937441018 | 94447687 | 10.08 |
| Cluster | Novel type | # Novel junctions | # Novel junc. in splice site | # Reads in cluster | # Novel junc. reads | Percentage [%] |
|---|---|---|---|---|---|---|
| Case | novel_acceptor | 93579 | 53271 | 896665067 | 527632 | 0.06 |
| Case | novel_donor | 93579 | 40308 | 896665067 | 377082 | 0.04 |
| Control | novel_acceptor | 89193 | 50708 | 937441018 | 508712 | 0.05 |
| Control | novel_donor | 89193 | 38485 | 937441018 | 349410 | 0.04 |
| Mis-spliced site | Cluster | # Annotated introns | # Ann. introns by splice site | Percent [%] |
|---|---|---|---|---|
| acceptor | Control | 168115 | 19582 | 11.65 |
| acceptor | SCA1 | 168115 | 18882 | 11.23 |
| both | Control | 168115 | 9200 | 5.47 |
| both | SCA1 | 168115 | 8590 | 5.11 |
| donor | Control | 168115 | 12736 | 7.58 |
| donor | SCA1 | 168115 | 13153 | 7.82 |
| never | Control | 168115 | 126597 | 75.30 |
| never | SCA1 | 168115 | 127490 | 75.83 |
| Cluster | # Annotated introns | # Reads in cluster | # Ann. intron reads | Percentage [%] |
|---|---|---|---|---|
| Case | 168115 | 663185400 | 55447132 | 8.36 |
| Control | 168115 | 727022405 | 60692971 | 8.35 |
| Cluster | Novel type | # Novel junctions | # Novel junc. in splice site | # Reads in cluster | # Novel junc. reads | Percentage [%] |
|---|---|---|---|---|---|---|
| Case | novel_acceptor | 64141 | 35767 | 663185400 | 281691 | 0.04 |
| Case | novel_donor | 64141 | 28374 | 663185400 | 210850 | 0.03 |
| Control | novel_acceptor | 67003 | 38156 | 727022405 | 327377 | 0.05 |
| Control | novel_donor | 67003 | 28847 | 727022405 | 233144 | 0.03 |
| Mis-spliced site | Cluster | # Annotated introns | # Ann. introns by splice site | Percent [%] |
|---|---|---|---|---|
| acceptor | Control | 159886 | 13505 | 8.45 |
| acceptor | SCA2 | 159886 | 14135 | 8.84 |
| both | Control | 159886 | 4523 | 2.83 |
| both | SCA2 | 159886 | 4381 | 2.74 |
| donor | Control | 159886 | 9123 | 5.71 |
| donor | SCA2 | 159886 | 9079 | 5.68 |
| never | Control | 159886 | 132735 | 83.02 |
| never | SCA2 | 159886 | 132291 | 82.74 |
| Cluster | # Annotated introns | # Reads in cluster | # Ann. intron reads | Percentage [%] |
|---|---|---|---|---|
| Case | 159886 | 584147295 | 38935749 | 6.67 |
| Control | 159886 | 710683525 | 39604626 | 5.57 |
| Cluster | Novel type | # Novel junctions | # Novel junc. in splice site | # Reads in cluster | # Novel junc. reads | Percentage [%] |
|---|---|---|---|---|---|---|
| Case | novel_acceptor | 39416 | 22961 | 584147295 | 182180 | 0.03 |
| Case | novel_donor | 39416 | 16455 | 584147295 | 132288 | 0.02 |
| Control | novel_acceptor | 38531 | 21928 | 710683525 | 169624 | 0.02 |
| Control | novel_donor | 38531 | 16603 | 710683525 | 118165 | 0.02 |
| Mis-spliced site | Cluster | # Annotated introns | # Ann. introns by splice site | Percent [%] |
|---|---|---|---|---|
| acceptor | Control | 170971 | 20916 | 12.23 |
| acceptor | SCA6 | 170971 | 19858 | 11.61 |
| both | Control | 170971 | 10418 | 6.09 |
| both | SCA6 | 170971 | 8296 | 4.85 |
| donor | Control | 170971 | 13521 | 7.91 |
| donor | SCA6 | 170971 | 12953 | 7.58 |
| never | Control | 170971 | 126116 | 73.76 |
| never | SCA6 | 170971 | 129864 | 75.96 |
| Cluster | # Annotated introns | # Reads in cluster | # Ann. intron reads | Percentage [%] |
|---|---|---|---|---|
| Case | 170971 | 616156554 | 67782298 | 11.00 |
| Control | 170971 | 712278697 | 68984361 | 9.69 |
| Cluster | Novel type | # Novel junctions | # Novel junc. in splice site | # Reads in cluster | # Novel junc. reads | Percentage [%] |
|---|---|---|---|---|---|---|
| Case | novel_acceptor | 64023 | 36427 | 616156554 | 316179 | 0.05 |
| Case | novel_donor | 64023 | 27596 | 616156554 | 220631 | 0.04 |
| Control | novel_acceptor | 74679 | 42471 | 712278697 | 403422 | 0.06 |
| Control | novel_donor | 74679 | 32208 | 712278697 | 275237 | 0.04 |
Distribution of sample RIN for Frontal Cortex level 3 study:
| Mis-spliced site | Cluster | # Annotated introns | # Ann. introns by splice site | Percent [%] |
|---|---|---|---|---|
| acceptor | Control | 190016 | 28043 | 14.76 |
| acceptor | FRDA | 190016 | 26193 | 13.78 |
| both | Control | 190016 | 16142 | 8.50 |
| both | FRDA | 190016 | 13383 | 7.04 |
| donor | Control | 190016 | 18596 | 9.79 |
| donor | FRDA | 190016 | 17677 | 9.30 |
| never | Control | 190016 | 127235 | 66.96 |
| never | FRDA | 190016 | 132763 | 69.87 |
| Cluster | # Annotated introns | # Reads in cluster | # Ann. intron reads | Percentage [%] |
|---|---|---|---|---|
| Case | 190016 | 866370018 | 155141061 | 17.91 |
| Control | 190016 | 1043087110 | 193646726 | 18.56 |
| Cluster | Novel type | # Novel junctions | # Novel junc. in splice site | # Reads in cluster | # Novel junc. reads | Percentage [%] |
|---|---|---|---|---|---|---|
| Case | novel_acceptor | 100061 | 55394 | 866370018 | 538635 | 0.06 |
| Case | novel_donor | 100061 | 44667 | 866370018 | 402906 | 0.05 |
| Control | novel_acceptor | 115374 | 63869 | 1043087110 | 696453 | 0.07 |
| Control | novel_donor | 115374 | 51505 | 1043087110 | 503840 | 0.05 |
| Mis-spliced site | Cluster | # Annotated introns | # Ann. introns by splice site | Percent [%] |
|---|---|---|---|---|
| acceptor | Control | 185853 | 25038 | 13.47 |
| acceptor | SCA1 | 185853 | 23801 | 12.81 |
| both | Control | 185853 | 12990 | 6.99 |
| both | SCA1 | 185853 | 11645 | 6.27 |
| donor | Control | 185853 | 17029 | 9.16 |
| donor | SCA1 | 185853 | 16604 | 8.93 |
| never | Control | 185853 | 130796 | 70.38 |
| never | SCA1 | 185853 | 133803 | 71.99 |
| Cluster | # Annotated introns | # Reads in cluster | # Ann. intron reads | Percentage [%] |
|---|---|---|---|---|
| Case | 185853 | 698071292 | 143266471 | 20.52 |
| Control | 185853 | 785114845 | 148634904 | 18.93 |
| Cluster | Novel type | # Novel junctions | # Novel junc. in splice site | # Reads in cluster | # Novel junc. reads | Percentage [%] |
|---|---|---|---|---|---|---|
| Case | novel_acceptor | 88195 | 48508 | 698071292 | 453261 | 0.06 |
| Case | novel_donor | 88195 | 39687 | 698071292 | 345121 | 0.05 |
| Control | novel_acceptor | 95396 | 52532 | 785114845 | 486387 | 0.06 |
| Control | novel_donor | 95396 | 42864 | 785114845 | 370534 | 0.05 |
| Mis-spliced site | Cluster | # Annotated introns | # Ann. introns by splice site | Percent [%] |
|---|---|---|---|---|
| acceptor | Control | 183050 | 24450 | 13.36 |
| acceptor | SCA2 | 183050 | 22490 | 12.29 |
| both | Control | 183050 | 12391 | 6.77 |
| both | SCA2 | 183050 | 9805 | 5.36 |
| donor | Control | 183050 | 16802 | 9.18 |
| donor | SCA2 | 183050 | 14962 | 8.17 |
| never | Control | 183050 | 129407 | 70.69 |
| never | SCA2 | 183050 | 135793 | 74.18 |
| Cluster | # Annotated introns | # Reads in cluster | # Ann. intron reads | Percentage [%] |
|---|---|---|---|---|
| Case | 183050 | 645767566 | 117766633 | 18.24 |
| Control | 183050 | 762434189 | 141342187 | 18.54 |
| Cluster | Novel type | # Novel junctions | # Novel junc. in splice site | # Reads in cluster | # Novel junc. reads | Percentage [%] |
|---|---|---|---|---|---|---|
| Case | novel_acceptor | 77025 | 43272 | 645767566 | 380720 | 0.06 |
| Case | novel_donor | 77025 | 33753 | 645767566 | 283002 | 0.04 |
| Control | novel_acceptor | 92338 | 50811 | 762434189 | 474444 | 0.06 |
| Control | novel_donor | 92338 | 41527 | 762434189 | 368842 | 0.05 |
| Mis-spliced site | Cluster | # Annotated introns | # Ann. introns by splice site | Percent [%] |
|---|---|---|---|---|
| acceptor | Control | 183595 | 23996 | 13.07 |
| acceptor | SCA6 | 183595 | 24755 | 13.48 |
| both | Control | 183595 | 11435 | 6.23 |
| both | SCA6 | 183595 | 13080 | 7.12 |
| donor | Control | 183595 | 16185 | 8.82 |
| donor | SCA6 | 183595 | 16434 | 8.95 |
| never | Control | 183595 | 131979 | 71.89 |
| never | SCA6 | 183595 | 129326 | 70.44 |

| Cluster | # Annotated introns | # Reads in cluster | # Ann. intron reads | Percentage [%] |
|---|---|---|---|---|
| Case | 183595 | 669517733 | 120694740 | 18.03 |
| Control | 183595 | 782827004 | 126120311 | 16.11 |

| Cluster | Novel type | # Novel junctions | # Novel junc. in splice site | # Reads in cluster | # Novel junc. reads | Percentage [%] |
|---|---|---|---|---|---|---|
| Case | novel_acceptor | 93862 | 52528 | 669517733 | 423931 | 0.06 |
| Case | novel_donor | 93862 | 41334 | 669517733 | 318321 | 0.05 |
| Control | novel_acceptor | 85458 | 47409 | 782827004 | 418368 | 0.05 |
| Control | novel_donor | 85458 | 38049 | 782827004 | 317316 | 0.04 |
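
In the two read tables above, the percentage column is the number of junction reads divided by `# Reads in cluster`, expressed in percent. The following Python sketch recomputes the SCA6 values; it is illustrative only (the report was generated in R), and the helper `read_percentage` and its `digits` parameter are hypothetical names:

```python
def read_percentage(junction_reads, cluster_reads, digits=2):
    """Share of a cluster's reads that support a junction class, in percent."""
    return round(100 * junction_reads / cluster_reads, digits)


# Values copied from the SCA6 read tables above.
cluster_reads = {"Case": 669517733, "Control": 782827004}
annotated_reads = {"Case": 120694740, "Control": 126120311}
novel_reads = {
    ("Case", "novel_acceptor"): 423931,
    ("Case", "novel_donor"): 318321,
    ("Control", "novel_acceptor"): 418368,
    ("Control", "novel_donor"): 317316,
}

annotated_pct = {
    cluster: read_percentage(n, cluster_reads[cluster])
    for cluster, n in annotated_reads.items()
}
novel_pct = {
    key: read_percentage(n, cluster_reads[key[0]])
    for key, n in novel_reads.items()
}

for cluster, pct in annotated_pct.items():
    print(f"{cluster} annotated introns: {pct}%")
for (cluster, novel_type), pct in novel_pct.items():
    print(f"{cluster} {novel_type}: {pct}%")
```

Because the novel-junction shares are all below 0.1%, two decimals hide most of the difference between clusters; passing a larger `digits` value makes the comparison easier to read.
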
## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
## setting value
## version R version 4.2.1 (2022-06-23)
## os Ubuntu 20.04.4 LTS
## system x86_64, linux-gnu
## ui X11
## language (EN)
## collate en_US.UTF-8
## ctype en_US.UTF-8
## tz Etc/UTC
## date 2023-06-05
## pandoc 2.18 @ /usr/lib/rstudio-server/bin/quarto/bin/tools/ (via rmarkdown)
##
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
## package * version date (UTC) lib source
## abind 1.4-5 2016-07-21 [1] RSPM (R 4.2.0)
## backports 1.4.1 2021-12-13 [1] RSPM (R 4.2.0)
## bit 4.0.5 2022-11-15 [1] RSPM (R 4.2.0)
## bit64 4.0.5 2020-08-30 [1] RSPM (R 4.2.0)
## bookdown 0.33 2023-03-06 [1] RSPM (R 4.2.0)
## broom 1.0.4 2023-03-11 [1] RSPM (R 4.2.0)
## bslib 0.4.2 2022-12-16 [1] RSPM (R 4.2.0)
## cachem 1.0.8 2023-05-01 [1] RSPM (R 4.2.0)
## car 3.1-2 2023-03-30 [1] RSPM (R 4.2.0)
## carData 3.0-5 2022-01-06 [1] RSPM (R 4.2.0)
## cli 3.6.1 2023-03-23 [1] RSPM (R 4.2.0)
## codetools 0.2-19 2023-02-01 [1] RSPM (R 4.2.0)
## colorspace 2.1-0 2023-01-23 [1] RSPM (R 4.2.0)
## crayon 1.5.2 2022-09-29 [1] RSPM (R 4.2.0)
## DBI 1.1.3 2022-06-18 [1] RSPM (R 4.2.0)
## digest 0.6.31 2022-12-11 [1] RSPM (R 4.2.0)
## doParallel * 1.0.17 2022-02-07 [1] RSPM (R 4.2.0)
## dplyr * 1.1.2 2023-04-20 [1] RSPM (R 4.2.0)
## evaluate 0.21 2023-05-05 [1] RSPM (R 4.2.0)
## fansi 1.0.4 2023-01-22 [1] RSPM (R 4.2.0)
## farver 2.1.1 2022-07-06 [1] RSPM (R 4.2.0)
## fastmap 1.1.1 2023-02-24 [1] RSPM (R 4.2.0)
## forcats * 1.0.0 2023-01-29 [1] RSPM (R 4.2.0)
## foreach * 1.5.2 2022-02-02 [1] RSPM (R 4.2.0)
## generics 0.1.3 2022-07-05 [1] RSPM (R 4.2.0)
## ggforce 0.4.1 2022-10-04 [1] RSPM (R 4.2.0)
## ggplot2 * 3.4.2 2023-04-03 [1] RSPM (R 4.2.0)
## ggpubr 0.6.0 2023-02-10 [1] RSPM (R 4.2.0)
## ggrepel 0.9.3 2023-02-03 [1] RSPM (R 4.2.0)
## ggsignif 0.6.4 2022-10-13 [1] RSPM (R 4.2.0)
## glue * 1.6.2 2022-02-24 [1] CRAN (R 4.2.0)
## gridExtra 2.3 2017-09-09 [1] RSPM (R 4.2.0)
## gtable 0.3.3 2023-03-21 [1] RSPM (R 4.2.0)
## here * 1.0.1 2020-12-13 [1] RSPM (R 4.2.0)
## highr 0.10 2022-12-22 [1] RSPM (R 4.2.0)
## hms 1.1.3 2023-03-21 [1] RSPM (R 4.2.0)
## htmltools 0.5.5 2023-03-23 [1] RSPM (R 4.2.0)
## httr 1.4.5 2023-02-24 [1] RSPM (R 4.2.0)
## iterators * 1.0.14 2022-02-05 [1] RSPM (R 4.2.0)
## jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.2.0)
## jsonlite 1.8.4 2022-12-06 [1] RSPM (R 4.2.0)
## kableExtra 1.3.4.9000 2023-01-30 [1] Github (haozhu233/kableExtra@292f607)
## knitr 1.43 2023-05-25 [1] RSPM (R 4.2.0)
## labeling 0.4.2 2020-10-20 [1] RSPM (R 4.2.0)
## lattice 0.21-8 2023-04-05 [1] RSPM (R 4.2.0)
## lifecycle 1.0.3 2022-10-07 [1] RSPM (R 4.2.0)
## lpSolve 5.6.18 2023-02-01 [1] RSPM (R 4.2.0)
## lubridate * 1.9.2 2023-02-10 [1] RSPM (R 4.2.0)
## magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.2.0)
## MASS 7.3-59 2023-04-21 [1] RSPM (R 4.2.0)
## Matrix 1.5-4 2023-04-04 [1] RSPM (R 4.2.0)
## mitools 2.4 2019-04-26 [1] RSPM (R 4.2.0)
## munsell 0.5.0 2018-06-12 [1] RSPM (R 4.2.0)
## pillar 1.9.0 2023-03-22 [1] RSPM (R 4.2.0)
## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.2.0)
## polyclip 1.10-4 2022-10-20 [1] RSPM (R 4.2.0)
## proxy 0.4-27 2022-06-09 [1] RSPM (R 4.2.0)
## purrr * 1.0.1 2023-01-10 [1] RSPM (R 4.2.0)
## R6 2.5.1 2021-08-19 [1] CRAN (R 4.2.0)
## Rcpp 1.0.10 2023-01-22 [1] RSPM (R 4.2.0)
## readr * 2.1.4 2023-02-10 [1] RSPM (R 4.2.0)
## rlang 1.1.1 2023-04-28 [1] RSPM (R 4.2.0)
## rmarkdown 2.21 2023-03-26 [1] RSPM (R 4.2.0)
## rprojroot 2.0.3 2022-04-02 [1] CRAN (R 4.2.0)
## rstatix 0.7.2 2023-02-01 [1] RSPM (R 4.2.0)
## rstudioapi 0.14 2022-08-22 [1] RSPM (R 4.2.0)
## rvest 1.0.3 2022-08-19 [1] RSPM (R 4.2.0)
## sass 0.4.6 2023-05-03 [1] RSPM (R 4.2.0)
## scales * 1.2.1 2022-08-20 [1] RSPM (R 4.2.0)
## sciRmdTheme 0.3 2023-03-10 [1] local
## sessioninfo * 1.2.2 2021-12-06 [1] RSPM (R 4.2.0)
## StatMatch 1.4.1 2022-03-01 [1] RSPM (R 4.2.0)
## stringi 1.7.12 2023-01-11 [1] RSPM (R 4.2.0)
## stringr * 1.5.0 2022-12-02 [1] RSPM (R 4.2.0)
## survey 4.2-1 2023-05-03 [1] RSPM (R 4.2.0)
## survival 3.5-5 2023-03-12 [1] RSPM (R 4.2.0)
## svglite 2.1.1 2023-01-10 [1] RSPM (R 4.2.0)
## systemfonts 1.0.4 2022-02-11 [1] RSPM (R 4.2.0)
## tibble * 3.2.1 2023-03-20 [1] RSPM (R 4.2.0)
## tidyr * 1.3.0 2023-01-24 [1] RSPM (R 4.2.0)
## tidyselect 1.2.0 2022-10-10 [1] RSPM (R 4.2.0)
## tidyverse * 2.0.0 2023-02-22 [1] RSPM (R 4.2.0)
## timechange 0.2.0 2023-01-11 [1] RSPM (R 4.2.0)
## tweenr 2.0.2 2022-09-06 [1] RSPM (R 4.2.0)
## tzdb 0.3.0 2022-03-28 [1] RSPM (R 4.2.0)
## utf8 1.2.3 2023-01-31 [1] RSPM (R 4.2.0)
## vctrs 0.6.2 2023-04-19 [1] RSPM (R 4.2.0)
## viridis 0.6.3 2023-05-03 [1] RSPM (R 4.2.0)
## viridisLite 0.4.2 2023-05-02 [1] RSPM (R 4.2.0)
## vroom 1.6.3 2023-04-28 [1] RSPM (R 4.2.0)
## webshot 0.5.4 2022-09-26 [1] RSPM (R 4.2.0)
## withr 2.5.0 2022-03-03 [1] CRAN (R 4.2.0)
## xfun 0.39 2023-04-20 [1] RSPM (R 4.2.0)
## xml2 1.3.4 2023-04-27 [1] RSPM (R 4.2.0)
## yaml 2.3.7 2023-01-23 [1] RSPM (R 4.2.0)
##
## [1] /usr/local/lib/R/site-library
## [2] /usr/local/lib/R/library
##
## ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────